Searching for targets on a model DNA: 
Effects of inter-segment hopping, detachment and re-attachment 
For most of the important processes in DNA metabolism, a protein has to reach a specific binding 
site on the DNA. The specific binding site may consist of just a few base pairs while the DNA is 
usually several millions of base pairs long. How does the protein search for the target site? What 
is the most efficient mechanism for a successful search? Motivated by these fundamental questions 
on intracellular biological processes, we have developed a model for the search for a specific site on a 
model DNA by a single protein. We have made a comparative quantitative study of the efficiencies 
of sliding, inter-segmental hopping and detachment/re-attachment of the particle during its search 
for the specific site on the DNA. We also introduce a new quantitative measure of the efficiency of 
a search process, defined in terms of a quantity that can be measured in in-vitro experiments. 

PACS numbers: 87.16.af, 87.10.Rt 
I. INTRODUCTION 

Self-avoiding walk (SAW) on a lattice serves as a 
paradigm for research on the statistical properties of 
natural and artificial polymers [1]. A "bridge" is defined as a 
bond on the lattice that connects two sites both of which 
are located on the SAW and are nearest neighbours on 
the lattice but are not nearest neighbours along the 
contour of the SAW. Random walks (RWs) on SAWs are an 
interesting problem in their own right because of the effects 
of the hops of the random walker across the bridges. Many 
years ago, motivated by the vibrational dynamics of 
proteins, the root-mean-square displacement of the random 
walker on a SAW was studied both in the absence and 
presence of hops across bridges [2-8]. RW on SAW has 
also been studied as one of the prototypes of RW in 
disordered and fractal media [9-11]. 

In this paper we report the effects of the hops of the 
random walker across the bridges on the distribution 
of its first passage times (FPTs) [12], i.e., the time 
taken by the walker to reach a target site for the first 
time. Moreover, we extend the model further by 
allowing detachments and re-attachments 
(to be described in detail in section IV); we also report 
the effects of these attachment/detachment processes 
on the distributions of the first 
passage times. This extension of the model and the 
computation of the first passage times are motivated by a 
biological process which is discussed in the next section. 
Therefore, this work may also be viewed as a biologically 
motivated extension of the works reported earlier [2-8]. 

This paper is organised as follows: In section II we 
discuss the biological motivation behind this problem. 
In section III we review some of the earlier works and 
compare our model to the previous models. In section 
IV we build the model and, thereafter, in section V we 
discuss the results. 



II. BIOLOGICAL MOTIVATION 

A cell is the structural and functional unit of a living 
system. DNA, the device used by nature for storage of ge- 
netic information, is essentially a linear polymer. The ge- 
netic information is chemically encoded in the sequence of 
the nucleotides, the monomeric subunits of DNA. Some 
viruses use RNA, instead of DNA, for storage of genetic 
information. In almost all processes involved in the nu- 
cleic acid (DNA or RNA) metabolism, specific proteins 
(or, more generally, macromolecular complexes) need to 
bind to specific sites on the nucleic acid. For example, a 
transcription factor must bind at the appropriate site on 
the DNA to initiate the process of transcription whereby 
genetic code is transcribed from the DNA to the corre- 
sponding RNA. Similarly, the processes of DNA repli- 
cation, repair and recombination also require binding of 
the corresponding appropriate proteins at specific sites 
on the DNA. Other processes of similar nature include 
restriction and modification of DNA by sequence-specific 
endonucleases. The typical length of a DNA chain could 
be millions of base-pairs, whereas the target site may be 
a sequence of just a few base pairs. Nevertheless, a protein 
usually succeeds in reaching the target in a remarkably 
short time. One of the most challenging open questions 
in molecular cell biology and biophysics is: how does a 
protein search such a long strand of DNA in an efficient 
manner to reach the target site? 

To our knowledge, this question was first formulated 
clearly by Von Hippel and coworkers [13], who also 
pointed out three possible mechanisms of search for 
the specific binding sites by the DNA-binding proteins. 
These three possible modes of search are as follows: 
(i) The protein slides diffusively along an effectively 
one-dimensional track formed by covalently-bonded bases of 
the DNA template, 

(ii) it not only slides along the DNA chain but, 
occasionally, also hops from one segment of the DNA to 
a neighbouring segment; proteins with more than one 
DNA-binding site can exploit this mechanism, 

(iii) in addition to sliding and intersegmental hopping, it 
also carries out a three-dimensional search for the specific 
binding site by first detaching from the DNA strand and, 
then, after executing three-dimensional diffusion in the 
solution, re-attaching at a new site which is uncorrelated 
with the site from which it detached (see Fig. 1). 
Various aspects of these mechanisms and their relative 
importance have been explored by many research groups 
in subsequent works (see next section for a brief review 
and comparison to our model). [iB-H^l- 



III. BRIEF REVIEW OF EARLIER MODELS 

Bustamante et al. [16] showed experimental evidence 
of the intersegmental transfer and hopping movements of 
E. coli RNA polymerase (RNAP) on nonspecific DNA. 
They also showed the effect of heparin, which disrupts 
the nonspecific RNAP-DNA complexes. (For a theoretical 
review of this phenomenon, see [17].) 

Burdzy and Holyst [ll] address an important question, 
namely the number of molecules needed to locate the tar- 
get of a given size. However, the theoretical arguments 
are not supported by any simulations. Also, the argu- 
ments are not in terms of FPTs, which may be more 
relevant biologically in the given context. 

The effect of sequence inhomogeneity of the DNA was 
taken into consideration by Slutsky et al. [21]. However, 
they focused only on a combination of one- and 
three-dimensional search mechanisms, without considering 
intersegmental transfers. Also, they modeled the DNA 
as a one-dimensional strand, which is not completely 
realistic in the biological context. 

The DNA was modeled as a one-dimensional strip con- 
sisting of low and high affinity sites by Rezania et al. [s^ . 
They also took a two dimensional strip which, in addi- 
tion to the above mentioned sites, has zero affinity water. 
However, they did not investigate the role of the bridges 
explicitly in their simulations. 

The model developed by Oshanin et al. [1^ is similar 
to our model, in that the search is carried out in discrete 
time steps till a maximum of N steps, until the immobile 
target is found. The survival probability is found in terms 
of the leakage probability and is optimized to minimize 
this probability. However, the calculations are done for a 
one-dimensional substrate, which may not be biologically 
realistic. 

Recently, Sheinman et al. [s^l studied the effect of 
intersegmental transfers on the search process. The DNA 
was however modeled by connecting an ideal gas of rods 
(of unit persistence length) randomly to form a small 
world network. The authors reported a decrease in the 
search time by using scaling arguments and numerical 
verification. They also found a dependence on the length 
of the DNA, an aspect which we do not address in great 
detail here. 

Therefore, in spite of the considerable attention that this 
problem has received recently, the role of all three mechanisms 
and, in particular, the role of intersegmental transfer 
together with attachment/detachment has not been 
investigated thoroughly. In this paper, we study all 
three mechanisms together, which complements some of 
the works reported earlier in elucidating 
the relative importance of each. 



FIG. 1: A pictorial depiction of the various mechanisms of 
searching for specific binding sites by a DNA-binding protein 
(e.g., searching of the promoter site by a transcription factor). 



IV. THE MODEL 

A DNA can be considered to be a freely jointed chain 
over length scales much longer than its persistence length. 
A freely jointed chain can be modeled using a SAW 
where the length of each of the steps of the SAW is typ- 
ically of the order of the persistence length. The per- 
sistence length of DNA is roughly 100 base-pairs (bps). 
Therefore, a SAW of total length L = 100 would cor- 
respond, approximately, to 10,000 base pairs which is 
comparable, for example, to the length of a bacteriophage 
DNA. 

Motivated by the experimental and theoretical works 
summarized in sections II and III, in this paper we 
explore the efficiency of searching the SAW by a random 
walker for a specific binding site on the SAW. We study 
the efficiency of various search mechanisms that the 
random walker may use in order to reach the target site. 
We have introduced a new quantitative measure of the 
efficiencies of the search mechanisms in terms of the 
time-scales that are relevant to this problem. 

FIG. 2: A pictorial depiction of a self-avoiding walk (SAW) 
generated on a square lattice. A → B → C represents sliding. 
A → D represents hopping across a bridge. A path from A to 
F which does not consist entirely of sliding along the 
contour and hopping across bridges is called jumping (it would 
consist of at least one detachment from the SAW and its 
re-attachment). The probabilities associated with hopping from 
one site to the other depend on the local conformation of the 
SAW. The boundaries of the underlying lattice are far away 
from the SAW. 

For the sake of simplicity, we consider SAWs in two 
dimensions, rather than three dimensions. The random 
walker is represented by a particle. The particle searches 
for the binding site by a combination of sliding, inter-segment 
hopping, and detachment followed by two-dimensional 
diffusion and, possibly, re-attachment (see Fig. 2). 
The sliding motion of the particle is captured by a 
one-dimensional RW in which its positions at successive time 
steps are nearest neighbours along the contour of the 
SAW. In contrast, an inter-segment hop of the 
particle takes place across a "bridge" that connects two sites 
both of which are located on the SAW and are nearest 
neighbours on the square lattice but are not nearest 
neighbours along the contour of the SAW. Finally, upon 
detachment from the SAW, a particle executes an 
unbiased RW on the square lattice and, during this process, 
may re-attach to the SAW if it hops onto a site 
occupied by the SAW. 
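
The definition of a bridge given above can be made concrete with a short sketch. It assumes the SAW is stored as an ordered list of square-lattice coordinates; the function name `find_bridges` and the four-site example walk are illustrative and not taken from the original simulations.

```python
from typing import List, Tuple

Site = Tuple[int, int]

def find_bridges(saw: List[Site]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of SAW sites joined by a bridge.

    A bridge joins two sites that are nearest neighbours on the square
    lattice but are not nearest neighbours along the contour of the SAW.
    """
    index_of = {site: i for i, site in enumerate(saw)}
    bridges = []
    for i, (x, y) in enumerate(saw):
        # Looking only to the right and upwards visits each lattice bond once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_of.get(neighbour)
            if j is not None and abs(i - j) > 1:
                bridges.append((min(i, j), max(i, j)))
    return sorted(bridges)

# Hypothetical U-shaped SAW: its two end sites are lattice neighbours.
saw = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(find_bridges(saw))
```

For a straight SAW the list of bridges is empty, so the walker can only slide.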

In our model we generate SAW configurations, each 
of length L = 101, on a square lattice (Fig. 2) using a 
combination of the reptation and kink-jump algorithms 
[ssl ]. Averaging over the configurations thus generated, 
we have verified that the mean-square end-to-end 
Euclidean distance of the SAWs satisfies the well known 
relation $\langle r^2 \rangle \propto L^{3/2}$. When the random walker 
is constrained to move only along the SAW, it 
performs, effectively, one-dimensional diffusion. We can 
determine the value of the effective diffusion constant 
D, where $D = \langle R_s^2 \rangle/(2t)$, $\langle R_s^2 \rangle$ being the mean-square 
displacement along the contour of the SAW. We 
have also verified that the mean-square Euclidean 
displacement of the random walker on the SAW follows 
$\langle R^2(t) \rangle \propto t^{3/4}$, even when hopping across the bridges 
is allowed. This is in agreement with the results 
reported earlier [2-8]. 

V. RESULTS AND DISCUSSION 

We parametrize the positions along the contour of the 
SAW by the symbol s; s = 1 and s = L correspond to 
the two end points of the SAW. We designate the two 
end points, i.e., s = 1 and s = L, as the specific binding 
sites for the particle. On each SAW of length L, we 
release a particle at the mid-point of the SAW (i.e., at 
s = (L + 1)/2) and allow it to execute a RW for a total of 
N discrete time steps. If the particle is unable to reach 
either of the target sites (i.e., s = 1 or s = L) within 
these N steps, then the search by that particle is aborted and 
the search by another particle starts. N is 5000 and L is 101 in all 
our simulations. In three different sets of computer 
experiments we implemented three different types of RWs 
of the particle. 

(i) Mechanism I (M I): The particle is allowed to perform 
a random walk only along the contour of the SAW. 

(ii) Mechanism II (M II): Hopping across the bridges is 
allowed, in addition to the process included in 
mechanism I [33]. 

(iii) Mechanism III (M III): Attachment and detachment 
of the particle are also allowed, in addition to the 
processes included in mechanism II [34]. 

For the random walkers, we impose absorbing 
boundary conditions at s = 1 and s = L, i.e., a successful search 
process is terminated once the walker reaches a target 
site for the first time. Under these boundary conditions, 
the time taken by a random walker to reach one of the 
two boundaries (i.e., s = 1 or s = L) is identified as the 
corresponding FPT. 
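
The protocol described above (release at the mid-point, absorbing targets at both ends, abort after N steps) can be sketched as follows. This is a minimal illustration, not the code used for the results: it assumes that at each step the walker chooses uniformly among its allowed moves, uses 0-based site indices, and the helper names and the example bridge are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

def contour_neighbours(L: int) -> Dict[int, List[int]]:
    """Mechanism I: site s (0-based) is linked only to s-1 and s+1."""
    return {s: [t for t in (s - 1, s + 1) if 0 <= t < L] for s in range(L)}

def with_bridges(nbrs: Dict[int, List[int]],
                 bridges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Mechanism II: add each bridge (i, j) as an extra allowed move."""
    out = {s: list(v) for s, v in nbrs.items()}
    for i, j in bridges:
        out[i].append(j)
        out[j].append(i)
    return out

def first_passage_time(nbrs: Dict[int, List[int]], start: int,
                       targets: Set[int], max_steps: int,
                       rng: random.Random) -> Optional[int]:
    """Number of steps until a target is first reached; None if aborted."""
    pos = start
    for t in range(1, max_steps + 1):
        pos = rng.choice(nbrs[pos])  # assumption: all allowed moves equally likely
        if pos in targets:
            return t
    return None

L, N = 101, 5000
rng = random.Random(0)
nbrs = contour_neighbours(L)
start, targets = (L - 1) // 2, {0, L - 1}   # mid-point and the two end points
fpts = [first_passage_time(nbrs, start, targets, N, rng) for _ in range(200)]
successes = [t for t in fpts if t is not None]
print(f"{len(successes)} of {len(fpts)} searches succeeded")
```

Passing `with_bridges(nbrs, bridges)` instead of `nbrs`, with the bridge list of a given SAW configuration, gives the corresponding sketch for Mechanism II; a histogram of `successes` then approximates P(t).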



A. Distributions of First Passage Times (FPTs) 

The distributions P(t) of the FPTs for the three 
mechanisms are plotted in Fig. 3. Since all three mechanisms 
are based on diffusive search, the qualitative shape of 
the curve P(t) is the same in all three cases. But, 
comparing the most probable times for the three mechanisms, 
we conclude that mechanism II is more efficient than 
mechanism I, whereas mechanism III is the most efficient 
of all. This observation strongly suggests that the search 
for DNA-binding sites by proteins would be more efficient 
if, in addition to sliding, both inter-segment hopping and 
detachment/re-attachment are also allowed. 

FIG. 3: The distribution of the FPTs for (a) Mechanism I, 
(b) Mechanism II, and (c) Mechanism III. (See Appendix A 
for fit parameters.) 



B. Relative importance of 
detachments / re-attachments 

In order to assess the importance of 
detachment/re-attachment relative to sliding and 
inter-segment hopping, we have computed, for each 
successful search process, the fraction of 
the time steps that the particle spends unattached from the 
SAW. We plot this fraction as a function of the search 
time t in Fig. 4. Note that the peak in Fig. 4 occurs at 
t = 191. Interestingly, this value is close to the most 
probable FPT in Fig. 3 corresponding to Mechanism III, 
namely t = 153. Thus, among the three mechanisms, the 
target site is reached in the shortest time if the particle 
uses mechanism III, in which the searching particle spends 
a fraction of the search time outside the SAW. 

FIG. 4: The fraction of the time steps during which the 
particle remains unattached from the SAW before reaching the 
target binding site in t steps. 

We have also computed the probability of 
re-attachment of a particle after t time steps, following its 
detachment from the SAW; this probability distribution 
is shown in Fig. 5. The log-log plot in the inset indicates 
the possibility of an initial power law regime, which is 
most likely ~ t~^^'^, crossing over to another power law 
regime at long times, which was found to be ^ t^'^/^. 



C. Mechanism I versus Mechanism II 



In this subsection, we consider a modified version of 
Mechanism II (MM II) which reduces to the mechanism 
I in a special limit. In this modified version, we compute 
the effect of forced hopping across the bridges, with a 
given probability. We define a quantity R as follows, 

$R = \frac{p_{bridge}}{p_{contour}}$    (1) 

$p_{bridge} + p_{contour} = 1$    (2) 

where $p_{bridge}$ is the probability of hopping across the 
bridge and $p_{contour}$ is the probability of diffusing along the 
contour. In the limit $p_{bridge} = 0$ (i.e., R = 0), this 
modified version reduces to mechanism I. 

FIG. 5: The re-attachment probability in Mechanism III. The 
inset shows the same data on a log-log scale. 

In Fig. 6, we plot the distribution P(t) of the FPTs 
for four different values of R. 

A higher value of R indicates a higher probability of 
hopping across a bridge. This gives rise to a higher prob- 
ability of reaching the ends in roughly the same amount 
of time. Therefore, if the protein has some bio-chemical 
means of hopping across such bridges preferentially, then 
it can bind to the specific binding site in a more efficient 
manner. 

However, as we see from Fig. 6, for an extremely high 
value of R, the walker tends to get trapped in a bridge 
and hence takes a longer time to reach the ends. For 
example, when R = 100, $p_{bridge} \approx 0.99$. For this value 
of $p_{bridge}$, the moment the walker encounters a bridge, it 
would tend to get trapped, hopping back and forth between the 
two sites joined by that bridge (for example, the bridge 
connecting "A" and "D" in Fig. 2). 
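
As a worked check of the value quoted for R = 100, and assuming that R is the ratio of $p_{bridge}$ to $p_{contour}$ with the two probabilities summing to one, the hopping probabilities for the four values of R studied here follow by elimination:

```latex
\begin{align*}
p_{bridge} &= \frac{R}{1+R}, \qquad p_{contour} = \frac{1}{1+R}, \\
R = 0.1 &: \quad p_{bridge} = \frac{0.1}{1.1} \approx 0.091, \\
R = 1   &: \quad p_{bridge} = \frac{1}{2} = 0.5, \\
R = 10  &: \quad p_{bridge} = \frac{10}{11} \approx 0.909, \\
R = 100 &: \quad p_{bridge} = \frac{100}{101} \approx 0.990 .
\end{align*}
```

At R = 100 a walker sitting at one end of a bridge therefore slides away with probability of only about 0.01 per step, which is the trapping effect described above.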



D. Quantitative estimates of efficiencies of 
search-times 



We are now in a position to compare the values of the 
most probable time, $\tau_{mp}$, and the mean first passage time 
(MFPT), $\tau_{avg}$, for the distributions of the FPTs of all the 
mechanisms that we have investigated so far. Let $\tau_{1D}$ be the 
most probable time (or the MFPT) for a successful search using 
Mechanism I, and let $\tau$ be the corresponding most probable 
time (or MFPT) for the specific mechanism under consideration. 

FIG. 6: The distribution of the FPTs for the modified 
Mechanism II for (a) R=0.1, (b) R=1, (c) R=10, and (d) R=100. 
To each curve, we fit a Gamma distribution and a difference 
of exponentials (see Appendix A). 



We define 

$\eta = \left| 1 - \frac{\tau}{\tau_{1D}} \right|$    (3) 

which we use as a quantitative measure of the efficiency of 
the process, relative to purely one-dimensional diffusion. 
The data are summarised in the table below. 



Mechanism        Most probable search    $\eta_{mp}$   Mean search             $\eta_{avg}$
                 time ($\tau_{mp}$)                    time ($\tau_{avg}$)

Mechanism I      841                     -             1931.6                  -
Mechanism II     494                     0.41          1419.8                  0.27
Mechanism III    153                     0.82          1346.5                  0.30
MM II (R=0.1)    115                     0.86          1243.9                  0.36
MM II (R=1)      117                     0.86          553.7                   0.71
MM II (R=10)     112                     0.87          598.3                   0.69
MM II (R=100)    457                     0.46          1243.9                  0.36



We conclude that among the possible mechanisms 
considered in this paper, the modified Mechanism II with 
R = 10 turns out to be the most efficient search process, 
as far as $\eta_{mp}$ is concerned. However, in terms of $\eta_{avg}$, 
R = 1 would be the most efficient search mechanism. 
Therefore, we conjecture that if both $\eta_{mp}$ and $\eta_{avg}$ play 
equally important roles in determining the efficiency of a 
given mechanism, then the most efficient search 
mechanism would correspond to the range 1 < R < 10. 
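
The efficiencies quoted above can be recomputed from the tabulated search times. The sketch below assumes the definition $\eta = |1 - \tau/\tau_{1D}|$ with Mechanism I as the one-dimensional reference; the function and dictionary names are illustrative.

```python
def efficiency(tau: float, tau_1d: float) -> float:
    """Efficiency relative to pure sliding: eta = |1 - tau / tau_1D|."""
    return abs(1.0 - tau / tau_1d)

# Search times taken from the table; Mechanism I is the 1D reference.
TAU_MP_1D, TAU_AVG_1D = 841.0, 1931.6
search_times = {
    "Mechanism II":  (494.0, 1419.8),
    "Mechanism III": (153.0, 1346.5),
    "MM II (R=0.1)": (115.0, 1243.9),
    "MM II (R=1)":   (117.0, 553.7),
    "MM II (R=10)":  (112.0, 598.3),
    "MM II (R=100)": (457.0, 1243.9),
}

eta = {
    name: (efficiency(t_mp, TAU_MP_1D), efficiency(t_avg, TAU_AVG_1D))
    for name, (t_mp, t_avg) in search_times.items()
}
for name, (eta_mp, eta_avg) in eta.items():
    print(f"{name:15s} eta_mp = {eta_mp:.2f}  eta_avg = {eta_avg:.2f}")

best_mp = max(eta, key=lambda name: eta[name][0])
best_avg = max(eta, key=lambda name: eta[name][1])
print(best_mp, best_avg)
```

The recomputed values agree with the tabulated ones to within 0.01, and the maxima fall at R = 10 for $\eta_{mp}$ and R = 1 for $\eta_{avg}$.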



Fit parameters of the Gamma distribution: 

Mechanism        a ≈       b ≈       k ≈

M I              1.483     0.001     266.725
M II             1.836     0.001     -
MM II (R=0.1)    1.146     0.001     -
MM II (R=1)      1.442     0.003     -
MM II (R=10)     1.346     0.003     -
MM II (R=100)    1.461     0.001     -

FIG. 7: (a) $\tau_{mp}$ as a function of R. (b) $\tau_{avg}$ as a 
function of R. (See Appendix A for fit parameters.) 



Fit parameters of the difference of exponentials: 

Mechanism        c ≈       d ≈       f ≈       h ≈

M I              -0.001    0.003     -0.001    0.001
M II             0.001     0.001     0.001     0.004
MM II (R=0.1)    0.100     0.001     0.099     0.001
MM II (R=1)      -0.003    0.023     -0.003    0.003
MM II (R=10)     -0.002    0.026     -0.002    0.002
MM II (R=100)    0.996     0.002     0.996     0.002

We also observe another interesting feature in MM II. 
We note that $P_{max} = P(\tau_{mp})$ is largest for R = 1 and 
next largest for R = 10. This implies that not only are these 
the most efficient of all the mechanisms considered, but 
that they also have the highest "success rate" of 
reaching the target sites. Therefore, we see that MM II with 
forced hopping across the bridges leads to the most 
efficient and successful search process. Whether all proteins 
with multiple DNA-binding sites actually make use of 
this mechanism to reach their target sites is something 
that needs to be tested experimentally under controlled 
conditions in the near future. 



In Fig. 7 we plot the most probable search time and 
the mean first passage time as functions of R. We do not 
show the point R = 0 (in the limit R → 0, we recover 
$\tau_{mp}$ = 841 and $\tau_{avg}$ = 1931.6) on the log scale. The 
only quantitative difference between the two curves is that the 
turning point in the curve for the MFPT lies in the range 
0.1 < R < 1, whereas the turning point in the curve for 
$\tau_{mp}$ lies in the range 0 < R < 0.1. 



E. Multiple Walkers and Immovable Barriers 

We have also investigated the search for the same 
binding sites simultaneously by N (> 1) interacting 
particles which are initially distributed randomly along the 
SAW. The positions of the particles are updated in 
parallel subject to the constraint that none of the lattice 
sites is occupied by more than one walker at a time. As 
is suggested by our intuition, $\langle R^2 \rangle$ decreases with 
an increasing number of random walkers. In the case of pure 
sliding, the interaction between the particles effectively 
constrains each one to a shorter region on the SAW. 
Consequently, $\langle R^2 \rangle$ decreases with increasing N. 
However, in the presence of bridges, there arise some 
situations in which a particle can bypass the other particles on 
its way by hopping across the bridges, thereby 
increasing $\langle R^2 \rangle$. The effects of mutual hindrance are further 
weakened by detachment/re-attachment processes. 

We also considered the situation in which 
immovable barriers are placed randomly along the SAW. This 
could mimic the effect of various obstacles that are 
present in vivo in the crowded environment of the cell. 
Mechanism III is the most efficient search process in 
the presence of these barriers. 

VI. SUMMARY AND CONCLUSION 

In this paper, we have suggested a biologically 
motivated extension of random walk on self-avoiding walks. 
The results of this investigation provide insight into the 
relative importance of different mechanisms of search for 
specific binding sites on DNA by DNA-binding proteins. We 
studied the effect of a preferential bias to hop across the 
bridges in the intersegmental transfer and found that, for 
1 < R < 10, the modified mechanism II turns out to be the most 
efficient. Whether this is the mechanism that proteins 
actually use in order to find the target sites can be 
verified only by doing controlled experiments. 

We also suggest experiments that can be performed to 
test the efficiency of the various search processes. The 
value of $\tau_{1D}$ can be taken as an input from standard 
known results. The values of $\tau_{mp}$ and $\tau_{avg}$ can be 
measured using fluorescence spectroscopy. The 
experimentally obtained $\eta$ can then be compared with the 
above-mentioned results, obtained using simulations, to throw 
light on the possible mechanism that the protein uses to 
search for its target site. 

Appendix A 

In this appendix, we analyze quantitatively the FPT 
distributions obtained for all the mechanisms. The 
Gamma distribution (GD) is one of the most 
appropriate forms for modelling waiting-time 
distributions and other similar phenomena. We fit all our FPT 
distributions (apart from M I) using a two-parameter 
GD: $\mathcal{T}(t) = b^a t^{a-1} e^{-bt}/\Gamma(a)$, where $\Gamma(a)$ is the gamma 
function of a, while a and b are parameters to be fitted 
using least-squares regression. For Mechanism I, we fit the 
FPT distribution to $\mathcal{T}(t) = b^a (t-k)^{a-1} e^{-b(t-k)}/\Gamma(a)$, 
where k is also a parameter to be fitted using least-squares 
regression. 

We observe that the data for the FPT distributions fit 
equally well to the difference of two exponentials. We fit 
the distributions to a four-parameter function as follows: 
$\mathcal{F}(t) = c e^{-dt} - f e^{-ht}$, where c, d, f and h are the 
parameters to be fitted using least-squares regression. 
Both the GD and the difference-of-exponentials fits to the 
FPT distribution of Mechanism III were poor, and hence are 
not shown in the figure. 

We have listed the fit parameters in the tables in 
Section V D. 

In Fig. 7 we plot $\tau_{mp}$ and $\tau_{avg}$ as functions of 
R. We find $\tau_{mp} = a e^{bR}$, where a ≈ 108.343 and 
b ≈ 0.014. On the other hand, $\tau_{avg} = c e^{dR} + f e^{hR}$, where 
c ≈ 550.342, d ≈ 0.008, f ≈ 2180.460 and h ≈ -11.462. 
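
As a consistency check, the fitted curves can be evaluated at the four simulated values of R. The sketch assumes that both fits are exponentials in R, i.e. $\tau_{mp} = a e^{bR}$ and $\tau_{avg} = c e^{dR} + f e^{hR}$, with the parameter values quoted above; the function names are illustrative.

```python
import math

def tau_mp_fit(R: float, a: float = 108.343, b: float = 0.014) -> float:
    """Single-exponential fit to the most probable search time."""
    return a * math.exp(b * R)

def tau_avg_fit(R: float, c: float = 550.342, d: float = 0.008,
                f: float = 2180.460, h: float = -11.462) -> float:
    """Two-exponential fit to the mean first passage time."""
    return c * math.exp(d * R) + f * math.exp(h * R)

for R in (0.1, 1.0, 10.0, 100.0):
    print(f"R = {R:6.1f}  tau_mp = {tau_mp_fit(R):7.1f}  "
          f"tau_avg = {tau_avg_fit(R):7.1f}")
```

Under this assumption, the rapidly decaying term with h < 0 accounts for the drop in $\tau_{avg}$ between R = 0.1 and R = 1, while the slowly growing term with d > 0 accounts for the rise towards R = 100.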